\documentclass[12pt]{article}

\usepackage{graphicx}      % Enable graphics commands
\usepackage{lscape}    	% Enable landscape with \begin{landscape} until \end{landscape}
\usepackage[section]{placeins} % Keep tables and figures within their own sections
\usepackage{natbib}			% Enable citation commands \citep{}, \citet{}, etc.
\bibpunct{(}{)}{;}{a}{}{,}		% Formatting for in-text citations
\usepackage{setspace}		% Enable double-spacing with \begin{spacing}{2} until \end{spacing}.
\usepackage[utf8]{inputenc} 	% Enable utf8 characters, i.e., accents without coding--just type them in.
\usepackage[english]{babel}	% English hyphenation and alphabetization.  Other languages available.
\usepackage{dcolumn}        % For decimal-aligned stargazer output.
\usepackage[colorlinks=true, urlcolor=blue, citecolor=black, linkcolor=black]{hyperref} % Include hyperlinks with the \url and \href commands.
\setlength{\tabcolsep}{1pt}	% Make tables slightly narrower by reducing space between columns.

\renewcommand\floatpagefraction{.9}	% These commands allow larger tables and graphics to fit
\renewcommand\topfraction{.9}		% on a page when default settings would complain.
\renewcommand\bottomfraction{.9}
\renewcommand\textfraction{.1}
\setcounter{totalnumber}{50}
\setcounter{topnumber}{50}
\setcounter{bottomnumber}{50}

\newcommand{\R}{\textsf{R}~}        %This creates the command \R to typeset the name R correctly.

%\usepackage[left=1in, right=1in]{geometry}	%Turn footnotes into endnotes (commented out).
%\renewcommand{\footnotesize}{\normalsize}	
%\usepackage{endnotes}
%\renewcommand{\footnote}{\endnote}
%\renewcommand{\section}{\subsection}
%\usepackage{fullpage}

\begin{document}


\title{On the Assessment and Use of\\ Cross-National Income Inequality Datasets}		
\author{
    Frederick Solt\\
    \href{mailto:frederick-solt@uiowa.edu}{frederick-solt@uiowa.edu}
}
\date{\today}				
\maketitle

\begin{abstract}
Researchers should ensure the data they employ are fit for their purpose, and they should maximize the quality of the data they choose.  In this paper, I review how this advice applies to broadly cross-national research on income inequality.  I demonstrate that the guidance \citet{Jenkins2015} offers to those pursuing cross-national research runs completely counter to the recommendations found in \citet{Atkinson2001, Atkinson2009}, the works that are the source of this advice and on which \citet{Jenkins2015} claims its own guidance is based.  I then show how the Standardized World Income Inequality Database (SWIID) incorporates Atkinson and Brandolini's recommendations to provide the most comparable data available for those engaged in broadly cross-national research on income inequality.
\end{abstract}

The assessment of the Standardized World Income Inequality Database (SWIID) offered in this issue \citep{Jenkins2015} suffers from bad timing.  It was written over the same period that a major revision of the SWIID and a new article of record for the dataset were being prepared.  Although I was able to alert its author that on several important points the work's observations had already been superseded, most notably regarding top income shares, he justifiably concluded that the piece should critique the dataset as it was then available.  (Fortunately, he could and did incorporate my advice that simply ignoring the SWIID's documentation and the uncertainty associated with its estimates was not justifiable.)  This has an entirely understandable but undeniably regrettable consequence: the bulk of the details in the paper's description and assessment of Version 4.0 of the SWIID simply do not accurately describe Version 5.0, released at the start of October 2014, or even Version 4.1 that preceded it.  

Still more regrettable---and unrelated to its timing---is that there is a great deal in the paper that does not even accurately describe Version 4.0.\footnote{A particularly egregious---and obvious---example is the unsupported assertion that the SWIID imputes \emph{all} observations \citep[27]{Jenkins2015}, when in fact it imputes only observations for which no data are available from the LIS (see \citealt{Solt2009} and forthcoming).  Making the assertion even more bizarre is the later acknowledgement that the LIS data are perfectly reproduced in the SWIID \citep[31]{Jenkins2015}.  One possible explanation is that its author, like many other producers of inequality statistics from microdata, overlooked the fact that such statistics have associated measurement error because they are based on survey samples rather than on the actual population of interest.  It is this measurement error, not multiple imputation, that accounts for the variation across the simulated series that make up the SWIID in those observations for which LIS data are available \citep[see][10-11]{Solt2014}.  It is too bad that the authors of the datasets reviewed in this issue were not invited to participate in the project from the outset.  It is likely that, had they been, this and many of the other errors that mar \citet{Jenkins2015} and the other assessments would have been avoided.}  A point-by-point correction of the many erroneous statements regarding the SWIID found in the now outdated paper, however, would serve little purpose; readers should simply refer to the SWIID's article of record \citep{Solt2014} for details on the current dataset and its construction.  

Here, I take up instead the recommendations the paper offers to those pursuing cross-national research on the causes and consequences of income inequality.  Given its author's admitted unfamiliarity with cross-national income inequality datasets \citep[3]{Jenkins2015} and his inexperience conducting cross-national research on income inequality (none of his many and deservedly influential publications falls into any of the three categories of studies that employ cross-national inequality datasets listed at pp.~2--3), it is hardly surprising that assessing cross-national income inequality datasets and how they should be employed proved to be a difficult remit.  The advice for researchers that the paper repeats from \citet{Atkinson2001, Atkinson2009} is indeed excellent.  On the other hand, the paper's own guidance---though undoubtedly well intentioned---is badly mistaken.  When pernicious recommendations are so thoroughly intermingled with sound advice, they only become more dangerous.  In what follows, I do my best to separate the wheat from the chaff.


\section*{Some \emph{Very} Bad Advice}
The course the paper advocates for cross-national researchers begins with selecting data from the somewhat dated \citet{UNU2008} dataset, which collects and presents information on income and consumption inequality from a wide array of sources available at the time of its release.\footnote{Here too the timing was unfortunate, as this dataset has since begun to be revised and brought up to date \citep[see][]{UNU2014}.}  Researchers should then `systematically employ some sort of selection ``algorithm''\thinspace' \citep[14]{Jenkins2015}, carefully investigating the documentation to choose the best observation for each country-year from among the multiple possibilities presented by the \citet{UNU2008} dataset; this algorithm should preferably begin by discarding all but those observations rated 1 on the dataset's 1-to-4 \emph{Quality} scale \citep[11-12]{Jenkins2015}.  The piece justifies this approach by quoting \citet[399]{Atkinson2009}, `[o]ne has to \emph{look at the data}, exercising judgment as to whether they are fit for purpose. Data quality \emph{does} matter' \citep[38]{Jenkins2015}.

But the piece's advocated approach does serious violence to the meaning of the quoted passage.  First, looking at the data and `exercising judgment as to whether they are fit for purpose' in the original work does not refer to picking and choosing individual observations but rather to ensuring a match between theory and measure.  One should not unreflectively employ whatever data may be most readily available, \citet[389]{Atkinson2009} explain: one must carefully consider whether the relevant variable is inequality in post-tax, post-transfer net income, as with theories linking inequality and growth, or whether it is instead pre-tax, pre-transfer market income, as when considering median-voter theories of taxation.  The data employed must be fit for the theory examined.  This is a fundamental tenet of research design that is too often neglected.

Second, `data quality' in the quotation does not concern the quality ratings found in the \citet{UNU2008} dataset and its predecessors.  \citet[790]{Atkinson2001} specifically argue against using such quality assessments as a criterion (as does the dataset's user guide; see \citealt[15]{UNU2008a}).  The \emph{Quality} variable describes mainly the quality of the \emph{documentation} of an observation; this in turn has much more to do with the timing and location of the underlying survey than with the validity of its data.  What actually `\emph{does} matter,' according to the quoted article, is avoiding the abrupt `breaks in consistency' that occur in series composed of data drawn from different sources and calculated on different bases, and the `breaks in continuity' that occur even in series from a single source when that source's underlying methodology changes \citep[389-393]{Atkinson2009}.

That the paper's advice completely contradicts that offered in the works of \citet{Atkinson2001, Atkinson2009} is most easily seen in the context of its own example, an exploration of the relationship between unemployment, inflation, and income inequality within OECD countries since 1980.  The process of arriving at a dataset for analysis (1) starts with the \citet{UNU2008} data, using only those observations with the highest \emph{Quality} scores.\footnote{Although this first screen in the selection algorithm is left implicit, I confirmed it in the replication materials.  These materials are not available online, but they can be obtained by request to the author \citep[i]{Jenkins2015}.}  As already mentioned, \citet[790]{Atkinson2001} counsel \emph{against} selecting observations on these grounds.  

The next step (2) retains those observations `in which the income definition referred to gross or disposable income' \citep[36]{Jenkins2015}.  \citet{Atkinson2001} point to two problems with the practice of mixing observations calculated on differing bases such as pre-tax gross income and post-tax disposable income.  The first problem they note is that the differences between `gross and disposable income can \ldots be substantial' and `[d]ifferences across countries, and across time, are to be expected as a result of differences in government fiscal policies and in tax incidence' \citep[788, 789]{Atkinson2001}.  As a result, mixing the two fails to meet their standard of generating `a data-set in which the observations are as fully consistent as possible' \citep[790]{Atkinson2001}.  The second problem arises from the fact that one should expect taxation and its progressivity to influence the relationships under examination: mixing gross- and disposable-income observations therefore suggests a simple failure to consider which variable is theoretically relevant, a step that \citet[389]{Atkinson2009} remind us is crucial.  

Finally, the selection proceeds by (3) inspecting the remaining sources for each country, employing judgment to choose among them when multiple observations for a given country-year are available.  `Wherever possible,' the author writes, `I chose for each country the ones providing the longest series according to a particular definition' \citep[36]{Jenkins2015}.  This fails to correspond to Atkinson and Brandolini's exhortation to ensure that data do not exhibit breaks in consistency and continuity.  Those authors maintain that `where two consecutive observations of the Gini index differ by four percentage points or more' investigation is `required' \citep[392]{Atkinson2009}.  Such events occur in two of the twenty series obtained.  There is an increase of 6.1 points in Belgium between 1992 and 1995, which is largely (if not completely) mirrored in other available data, and an even sharper increase of 4.1 points in a single year in Italy, which is not. `Looking at the data' reveals that the 4.1 Gini-point difference between the European Commission's data for 2001 and Brandolini's (\citeyear{Brandolini2004}) data for 2002 is clearly a break in consistency rather than an actual rise in income inequality.
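
The four-point screen proposed by \citet[392]{Atkinson2009} is easy to automate, which makes it a useful first pass before looking at the data series by series.  The chunk below is a minimal sketch, not the procedure of either work discussed here: the helper \texttt{flag.breaks} is hypothetical, and the Ruritanian series it is applied to is invented.  A flag marks only a jump that requires investigation; deciding whether it reflects a break in consistency or continuity still means comparing it against other sources, as in the Italian case.

<<breaks-sketch, echo=TRUE>>=
# Flag consecutive observations whose Gini indices differ by `threshold`
# points or more (Atkinson and Brandolini's four-point rule of thumb).
# flag.breaks() is a hypothetical helper written for this illustration.
flag.breaks <- function(year, gini, threshold = 4) {
  ord  <- order(year)                  # compare observations in date order
  year <- year[ord]
  gini <- gini[ord]
  jump <- c(NA, diff(gini))            # change from the previous observation
  data.frame(year = year, gini = gini, jump = jump,
             flag = !is.na(jump) & abs(jump) >= threshold)
}

# Invented series for Ruritania, not actual data
rur.breaks <- flag.breaks(year = 2000:2005,
                          gini = c(30.2, 30.6, 34.8, 35.0, 34.7, 35.1))
rur.breaks[rur.breaks$flag, ]          # only the 2001-2002 jump is flagged
@

Because the comparison is between consecutive \emph{observations} rather than consecutive years, the same function applies to series with gaps.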

In short, the advice---and the example offered as a model for researchers to follow---provided by \citet{Jenkins2015} is very nearly opposite to that found in \citet{Atkinson2001, Atkinson2009}.  I would add that this advice also flies in the face of the concerns raised by the ongoing crisis of replication roiling the social sciences and beyond.\footnote{\citet{Atkinson2001, Atkinson2009}, of course, were written well before the eruption of this crisis (their later work was accepted for publication on January 3, 2006), and so these concerns played no role in their suggestions.}  The exercise of researcher judgment on which observations to include in an analysis, even when this judgment is ostensibly delimited by explicit conditions, has been the cause of major errors with severe real-world consequences (see, e.g., the documentation by \citealt{Herndon2014} of the problematic data selection in \citealt{Reinhart2010}).  Relying on judgment creates additional researcher degrees of freedom \citep{Simmons2011} that make analyses less trustworthy even in the complete absence of `fishing expeditions' and `$p$-hacking' aimed at finding a particular result \citep{Gelman2014}.  This, too, is made clear by reference to the example put forward as a model for researchers in \citet{Jenkins2015}.

Why opt to use, for the United States, only the five observations of disposable-income inequality in the UNU data from the LIS (1986, 1991, 1994, 1997, 2000) rather than the complete 17-year-long disposable-income inequality series provided by \citet{Brandolini1998} that is also included in the dataset?  There is no hint in the terms of the declared algorithm: one can only conclude that it must have been a matter of judgment that choosing the longest series was not possible in this case.  The sample also excludes the still longer series from the U.S. Census Bureau, which is perhaps justifiable in terms of the algorithm on the grounds that the welfare definition employed was `Monetary Income, Gross' rather than `Income, Gross' (a distinction that makes very little difference in the U.S. case, as observations from the LIS using both definitions included in the \citet{UNU2008} dataset confirm).  Lest one think the U.S. case an aberration, the sample excludes \emph{all but one} observation for Australia.\footnote{This is the origin of my earlier reference to 20 series though 21 countries are included in this analysis.  There could be no break in the data for Australia because its series consists of only a single observation.}  The \citet{UNU2008} dataset includes 35 observations with complete geographic and population coverage---and top \emph{Quality} scores, no less---for Australia in the time period examined, including a series calculated on the same basis by the Australian Bureau of Statistics in eight of ten years from 1995 to 2004.  This series may have been discarded because the welfare definition was `Monetary Income, Disposable' rather than `Income, Disposable,' but in this case too this is a distinction with little difference.  Moreover, these decisions are completely invisible to readers (and to reviewers, had the piece been subject to peer review); they only come to light when one carefully peruses both the original dataset and Jenkins' replication materials.  
One could point to still other questionable decisions made in this application of judgment, e.g., regarding Norway. This underscores my point: even if the possibility occurs to them and the necessary replication materials have been made public, neither reviewers nor readers are likely to have the time and inclination to wade through all of the data and confirm that every decision an author makes in applying his or her own judgment is in fact defensible.

Not surprisingly, how one exercises one's judgment affects the results reached.  Table~\ref{T1} presents replications of Models 3 and 4 of Table 4 in \citet{Jenkins2015}---that is, the models that employ the hand-selected WIID data and cluster-robust standard errors by country---along with additional models that replace the data selected for the United States and Australia in that piece with the longer \citet{Brandolini1998} and ABS series.  The statistically significant results found in Model 3 disappear in Model 3A, while the estimated coefficient for unemployment that is not statistically significant in Model 4 becomes statistically significant in Model 4A.  In other words: the advice offered in \citet{Jenkins2015} makes possible virtually any result, and so it makes all results untrustworthy \citep[see][]{Simmons2011}.

<<tab4, echo=FALSE, results='hide', warning=FALSE, message=FALSE>>=
ipak <- function(pkg){
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg)) 
        install.packages(new.pkg, dependencies = TRUE)
    sapply(pkg, require, character.only = TRUE)
}

packages <- c("apsrtable", "foreign", "sandwich", "lmtest")
ipak(packages)

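# Cluster-robust covariance matrix for an lm() fit, with a finite-sample
# correction; returns the covariance matrix and the coeftest() table built on it
robust.se <- function(model, cluster){
 require(sandwich)
 require(lmtest)
 M <- length(unique(cluster))             # number of clusters (countries)
 N <- length(cluster)                     # number of observations
 K <- model$rank                          # number of estimated coefficients
 dfc <- (M/(M - 1)) * ((N - 1)/(N - K))   # finite-sample adjustment
 # sum each observation's score contributions within its cluster
 uj <- apply(estfun(model), 2, function(x) tapply(x, cluster, sum))
 rcse.cov <- dfc * sandwich(model, meat. = crossprod(uj)/N)
 rcse.se <- coeftest(model, rcse.cov)
 return(list(rcse.cov, rcse.se))
}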

j.data <- read.dta("j4.dta") # dataset provided by Jenkins
j.data1 <- droplevels(j.data[j.data$pickreg==1,])  # pickreg is Jenkins' sample
j.data2 <- droplevels(j.data[j.data$pickreg2==1,]) # pickreg2 substitutes Brandolini and ABS series

m3 <- lm(ReportedGini ~ inflation + unemp + time, data=j.data1)
m3$se <- robust.se(m3, j.data1$Country)[[1]]

m4 <- lm(ReportedGini ~ inflation + unemp + time + incsharu2 + equivsc, data=j.data1)
m4$se <- robust.se(m4, j.data1$Country)[[1]]
m4$coefficients <- m4$coefficients[-12] # to deal with indicators dropped for multicollinearity
m4$coef <- m4$coef[-12]
m4$se <- m4$se[-12,-12]

m3a <- lm(ReportedGini ~ inflation + unemp + time, data=j.data2)
m3a$se <- robust.se(m3a, j.data2$Country)[[1]]

m4a <- lm(ReportedGini ~ inflation + unemp + time + incsharu2 + equivsc, data=j.data2)
m4a$se <- robust.se(m4a, j.data2$Country)[[1]]
m4a$coefficients <- m4a$coefficients[-13]
m4a$coef <- m4a$coef[-13]
m4a$se <- m4a$se[-13,-13]
@

\begin{table}[htbp] 
\caption{A Paradise for $p$-Hackers}
\label{T1}
<<label=Table4_Replication, results='asis', echo=FALSE>>=
mn <- c("Model 3", "Model 3A", "Model 4", "Model 4A")
cn <- c("Inflation", "Unemployment", "Time")
t1 <- apsrtable(m3, m3a, m4, m4a, digits=3, model.names=mn ,stars="default", notes="", omitcoef= c(1, 5:12), col.hspace=".5in", align="left", coef.names=cn, Sweave=TRUE)
t1 <- gsub("\\^\\\\dagger", "", t1)
t1 <- gsub("Re.*", "", t1)
t1 <- gsub("ad.*","", t1)
t1[5] <- gsub("(.*)\\n(.*)\\n(.*)", "DV Adjustment \\\\qquad & no & no & yes & yes \\\\\\\\ \n \\\\\\\\ \n\\1 \n\\2 \n", t1[5])
t1[6] <- "\\hline\n"
t1[7] <- ""
t1
@
\begin{footnotesize}
\begin{tabular}{p{5.1in}}
\emph{Notes}: $^* p<.05$, $^{**} p<.01$, $^{***} p<.001$; OLS estimates with country-level cluster-robust standard errors.  Models 3 and 4 replicate those models in Jenkins (2015), Table 4; Model~4 includes `dummy variable adjustments,' discussed further in the text below.  Models 3A and 4A are identical but replace the data selected for the United States and Australia with the longer Brandolini (1998) and Australian Bureau of Statistics series available for those countries (see text).  The statistically significant results found in Model~3 disappear in Model 3A, while the estimated coefficient for unemployment that is not statistically significant in Model 4 becomes statistically significant in Model 4A.
\end{tabular}
\end{footnotesize}
\end{table}

I turn finally to one last bit of ill-considered advice the paper offers as a `complementary' approach for cross-national researchers: the use of so-called `dummy variable adjustments' to account for the incomparability in the Gini indices algorithmically selected from the \citet{UNU2008} dataset in accordance with its other recommendations \citep[38]{Jenkins2015}.  The piece reiterates (and indeed references as if original) the point that simply including dummy variables for each of the different welfare definitions and equivalence scales upon which these statistics were calculated makes the implausible assumption that the differences between these various bases are constant across countries and over time (see, e.g., \citealt[788-790]{Atkinson2001}; \citealt[388]{Atkinson2009}; \citealt[233]{Solt2009}; \citealt[4-6]{Solt2014}).  In light of this, the piece advises that researchers `should use a number of carefully-defined interaction variables to account for variations in Gini differences across time and space' \citep[38]{Jenkins2015}.  It offers no further guidance on how one might actually implement this last vague recommendation.  This is just as well: such dummy variables, with or without `carefully-defined' interaction terms, introduce bias by comparing the difference between definitions \emph{across different country-years} \citep[see][788]{Atkinson2001}.  This is easily seen by imagining two countries, Ruritania and Megalomania.  Megalomania has high income inequality and only measures of gross income inequality; Ruritania has low income inequality and only measures of net income inequality.  A dummy variable identifying those observations calculated using gross income will capture not only the difference between inequality in gross and net income but also the difference in the level of inequality between Ruritania and Megalomania.  
Here, again, the recommendations of \citet{Jenkins2015} contradict the advice of \citet[788]{Atkinson2001}, who pointed out, `A more satisfactory procedure \ldots compares \emph{paired} estimates, i.e., when both gross and net incomes are available for \emph{the same country at the same date}.'  One simply should not proceed in the manner recommended in \citet{Jenkins2015}.  Its guidance is wrong at every step.
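
The Ruritania and Megalomania example can be made concrete with a few lines of \textsf{R}.  All numbers are invented: Megalomania's gross- and net-income Ginis are set at 44 and 40, so the true gap between the two definitions is 4 points, but only the gross figure is observed; Ruritania reports only a net-income Gini of 25.  This is a sketch under those assumptions, not a reanalysis of any actual data.

<<dummy-bias-sketch, echo=TRUE>>=
# Invented values (Gini points) for the two hypothetical countries
years <- 2001:2010
n <- length(years)
mega.gross <- 44    # Megalomania reports only gross-income Ginis...
mega.net   <- 40    # ...so this true net figure is never observed (gap = 4)
rur.net    <- 25    # Ruritania reports only net-income Ginis

obs <- data.frame(
  country = rep(c("Megalomania", "Ruritania"), each = n),
  gross   = rep(c(1, 0), each = n),          # 1 = gross-income observation
  gini    = c(rep(mega.gross, n), rep(rur.net, n))
)

# The gross-income dummy absorbs the cross-country difference in levels
fit <- lm(gini ~ gross, data = obs)
coef(fit)["gross"]                           # 19, not the true gap of 4
mega.gross - mega.net                        # the true gap, for comparison

# A country indicator cannot separate the two: it is perfectly collinear
# with the dummy, so lm() reports NA for it
fit.cty <- lm(gini ~ gross + country, data = obs)
coef(fit.cty)
@

The dummy's coefficient is simply the gap between the two countries' observed series, so it conflates the definitional difference with the difference in levels; no interaction term can recover the definitional difference when no country-year offers both measures.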


\section*{Good Advice, and Its Application}
I turn now to the \emph{actual} recommendations provided by \citet{Atkinson2001, Atkinson2009} and describe how the SWIID incorporates them.  \citet[389]{Atkinson2009} enjoin researchers to begin by looking at the data they employ to be sure that they are fit for their purposes, that is, to first identify the relevant variable for their theory rather than allowing data availability to decide for them.  The SWIID includes estimates of the Gini index of net-income inequality and of market-income inequality, as well as estimates of absolute redistribution (the difference between those two income-inequality measures) and relative redistribution (the proportional reduction in market-income inequality), thereby meeting the needs of most researchers \citep[see][21]{Solt2014}.\footnote{Most, but certainly not all: those whose theories speak directly to income shares or ratios have no alternative but to look instead to the quintile and decile data provided by \citet{UNU2014}, carefully selecting only those statistics based on the same welfare definition and equivalence scale.  Despite appearances to the contrary, the World Top Incomes Database \citep{Alvaredo2014} does not provide cross-nationally comparable data for this purpose; substantial differences across countries in the definitions of taxable income and the tax unit as well as in the prevalence of tax avoidance and evasion across incomes led that dataset's compilers to use it only to compare trends over time across countries, not levels \citep[see][4-5 and passim]{Atkinson2011}.}  The sample to be analyzed should likewise be chosen to match the universe implied by theory as closely as possible rather than be dictated by data availability.  
The SWIID provides the broadest possible sample of countries and years---more than twice as many observations as in the next largest dataset---so as to provide data for testing theories of wide scope \citep[see][13]{Solt2014}; regional and national sources may be available for tests of theories of narrower application.\footnote{I note, however, that regional sources are often surprisingly insufficient, even for fairly recent years in data-rich parts of the world.  For example, using only Eurostat data to provide information about the context of inequality in which the first four waves of the European Social Survey were conducted (from 2002 to 2009) would result in missing data for more than one-fourth of the country-years in the sample \citep[see][]{Solt2015}.}
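
In symbols, and writing $G^{m}_{it}$ and $G^{n}_{it}$ for the Gini indices of market- and net-income inequality in country $i$ and year $t$, the two redistribution measures described above are as follows (the labels $\mathrm{AR}$ and $\mathrm{RR}$ are introduced here only for convenience):
\[
  \mathrm{AR}_{it} = G^{m}_{it} - G^{n}_{it}, \qquad
  \mathrm{RR}_{it} = \frac{G^{m}_{it} - G^{n}_{it}}{G^{m}_{it}}.
\]
A country-year with a market-income Gini of 45 and a net-income Gini of 30, for example, would show absolute redistribution of 15 Gini points and relative redistribution of one-third.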

To maximize what \citet{Atkinson2009} refer to as data quality, the SWIID is designed to avoid the `breaks in consistency' that occur when series are drawn from different sources and the `breaks in continuity' that occur with changes of methodology within the same source.  Its first step in addressing these issues is to use \emph{all} of the available data with full geographic and population coverage and for which the welfare definition and equivalence scale are known---drawing on the \citet{UNU2008} dataset, regional sources, national statistical offices, and the academic literature.  This follows Atkinson and Brandolini's (\citeyear[790]{Atkinson2001}) observation---echoed by \citet[15]{UNU2008a}---that the quality of documentation should not be used to select data; it also eliminates problematic researcher degrees of freedom \citep[cf.][]{Simmons2011}.  The SWIID then uses these data to estimate the values missing from the inequality data available from the Luxembourg Income Study \citep{LIS2014, LIS2014a}, the source of the most comparable cross-national income inequality data available.

To avoid the biases that result from the use of dummy variables, \citet[788]{Atkinson2001} direct that the adjustments between data using different welfare definitions and equivalence scales should be calculated from comparisons of observations that differ in these ways but describe the same country-year.  The SWIID's estimates of observations missing from the LIS are therefore based on adjustments calculated only from data that share a country and year \citep[see][9]{Solt2014}.  Suppose, for example, that data on Ruritania are available from both the LIS Key Figures and the Ruritanian National Statistics Bureau (RNSB) in 2010 but only from the RNSB in 2011.  The SWIID's adjustment to the 2011 RNSB data would be based on the relationship between the LIS and RNSB data in 2010, not on a comparison of the 2011 RNSB data to the 2010 LIS data.
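
The arithmetic of the Ruritanian example is shown below with invented figures: a LIS Gini of 30.0 and an RNSB Gini of 32.0 for 2010, and an RNSB Gini of 33.6 for 2011.  This is a sketch of the paired-adjustment principle only, not the SWIID's estimation code, and the helper \texttt{gini.of} is hypothetical.

<<paired-adjustment-sketch, echo=TRUE>>=
# Invented figures for the hypothetical Ruritanian example in the text
rur.obs <- data.frame(
  source = c("LIS", "RNSB", "RNSB"),
  year   = c(2010, 2010, 2011),
  gini   = c(30.0, 32.0, 33.6)
)
# Hypothetical helper: look up the Gini for one source and year
gini.of <- function(src, yr) rur.obs$gini[rur.obs$source == src & rur.obs$year == yr]

# Adjustment from the paired country-year (both sources in 2010)
ratio.2010 <- gini.of("LIS", 2010) / gini.of("RNSB", 2010)   # 0.9375
est.2011   <- gini.of("RNSB", 2011) * ratio.2010             # 31.5 on the LIS basis
est.2011

# Unpaired comparison, for contrast: setting 2011 RNSB against 2010 LIS
# treats the real 2010-2011 rise as a definitional difference and erases it
ratio.bad <- gini.of("LIS", 2010) / gini.of("RNSB", 2011)
gini.of("RNSB", 2011) * ratio.bad                            # 30, i.e., no change
@

The paired adjustment preserves the RNSB series' rise from 2010 to 2011 while moving its level onto the LIS basis.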

\citet[389]{Atkinson2009} point out that the theoretical relationship between Gini indices computed on the basis of varying welfare definitions suggests these adjustments should be multiplicative rather than additive, so the SWIID adjustments are calculated as ratios rather than differences.  Because, as \citet[790]{Atkinson2001} observe, welfare definitions and equivalence scales interact, the SWIID's adjustments are calculated for each distinct \emph{combination} of definition and scale \citep[see][8-9]{Solt2014}. 
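
A sketch of these two points, again with invented numbers: each adjustment is a ratio computed from paired country-year observations, and a separate ratio is kept for each observed combination of welfare definition and equivalence scale.  Averaging the paired ratios within each combination is a simplification assumed for illustration; the SWIID's actual adjustments also vary across countries and over time, as described below.

<<combination-ratio-sketch, echo=TRUE>>=
# Invented paired observations: each row is one country-year in which a Gini
# on the reference basis and a Gini on some other basis are both available
paired <- data.frame(
  welfare = c("gross", "gross", "gross", "net", "net"),
  scale   = c("household", "household", "per capita", "household", "household"),
  ref     = c(30, 34, 31, 29, 33),   # Gini on the reference basis
  other   = c(36, 40, 35, 31, 35)    # Gini on the alternative basis
)
paired$ratio <- paired$ref / paired$other         # multiplicative, not ref - other
paired$combo <- interaction(paired$welfare, paired$scale,
                            sep = " / ", drop = TRUE)

# One adjustment per observed combination of welfare definition and scale
adj <- tapply(paired$ratio, paired$combo, mean)
round(adj, 3)

# Apply the matching ratio to a new gross, household-scale observation of 42
42 * unname(adj["gross / household"])
@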

Atkinson and Brandolini (\citeyear[788-790]{Atkinson2001}; \citeyear[388]{Atkinson2009}) document that differences between these welfare definitions and equivalence scales vary across countries and over time and therefore advise that global fixed adjustments are insufficient for making comparable observations that were calculated on varying bases.  Accordingly, the SWIID avoids such global fixed adjustments entirely.  Rather, its adjustments vary over space and time as much as the available data allow (see \citealt[9-10]{Solt2014} for details).  Although the SWIID includes a great many observations for relatively data-poor parts of the world, over half of its observations are based on adjustments calculated from available data within the same country as the observation, and more than a third are based on adjustments that vary over time as well; among more advanced countries, these proportions are considerably higher \citep[see][14-15]{Solt2014}.  Adjustments that must be estimated from data pooled across broader spans of time and space are necessarily less certain, and indeed the uncertainty can be considerable in relatively data-poor countries.  The SWIID therefore incorporates this uncertainty---and the uncertainty due to sampling error present in even the very high quality data provided by the LIS---to ensure that researchers draw correct inferences, and it is pre-formatted for use with the tools available in both Stata and \textsf{R} for data measured with error.  Finally, to take into account Atkinson and Brandolini's (\citeyear[393]{Atkinson2009}) point that `income distribution data do not move sharply' and that observations that indicate otherwise mark breaks in consistency or continuity rather than actual change, a moving-average smoother is applied to the SWIID estimates. 
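
Two of the steps just described can be sketched in \textsf{R}.  The first function is a centered moving average; its three-year window is an assumption chosen for illustration, not necessarily the window the SWIID uses.  The second shows how a researcher might combine the results of an analysis rerun on each of the SWIID's simulated series using Rubin's rules for multiply imputed data, which tools such as Stata's \texttt{mi estimate} and the \textsf{R} package \texttt{mitools} implement.  The function names \texttt{ma3} and \texttt{combine.sims} and all numbers are hypothetical.

<<smooth-combine-sketch, echo=TRUE>>=
# (1) Centered moving average. The three-year window is an assumption made
#     for illustration; the end points, lacking neighbors, are left NA.
ma3 <- function(x) as.numeric(stats::filter(x, rep(1/3, 3), sides = 2))

gini.raw <- c(30, 30, 34, 30, 30)     # invented series with a one-year jump
ma3(gini.raw)                          # NA, then 31.33 three times, then NA

# (2) Combine estimates of one coefficient obtained by rerunning an analysis
#     on each of m simulated series (Rubin's rules for multiple imputation)
combine.sims <- function(est, se) {
  m     <- length(est)
  qbar  <- mean(est)                  # combined point estimate
  w     <- mean(se^2)                 # average within-series variance
  b     <- var(est)                   # between-series variance
  total <- w + (1 + 1/m) * b          # total variance
  c(estimate = qbar, se = sqrt(total))
}

combine.sims(est = c(0.50, 0.55, 0.45), se = c(0.10, 0.10, 0.10))
@

The combined standard error exceeds the average within-series standard error because it also reflects the disagreement across simulated series.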

%Information from the University of Texas Inequality Project (UTIP, see \citealt{Galbraith2009}), the World Top Income Database (WTIP, see \citealt{Atkinson2011}\nocite{Alvaredo2014}), and the estimates of surrounding years are used within each country to generate estimates for years with no Gini data available.  Here, too, the resulting uncertainty is incorporated into the SWIID.  

As a result of this careful attention to the sound advice of \citet{Atkinson2001, Atkinson2009}, the SWIID makes sense of what those authors called the `bewildering variety of estimates' \citep[784]{Atkinson2001} found elsewhere.  Its success in this regard can be seen by comparing its estimates with later releases of data from the LIS.  The exercise reveals an impressive record of out-of-sample prediction: for only 5 of the 71 observations, or 7\%, are the differences between the LIS Key Figures and previously released data from the SWIID substantively and statistically significant \citep[see][16-18]{Solt2014}.  This record is all the more impressive given that the difference between the LIS data and observations carefully selected from the \citet{UNU2014} dataset to maximize their comparability was substantively and statistically significant in 21\% of country-years \citep[18]{Solt2014}.  The SWIID provides the most comparable data possible for the broadest sample of country-years of any cross-national income-inequality dataset.  
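
One way to carry out such an out-of-sample check is sketched below.  The test of the difference between two estimates with independent standard errors, the one-point substantive threshold, the helper \texttt{compare.est}, and all figures are assumptions for illustration; they are not necessarily the criteria used in \citet{Solt2014}.

<<oos-check-sketch, echo=TRUE>>=
# Compare earlier SWIID estimates with later LIS figures for the same
# country-years. The normal approximation and the one-Gini-point
# substantive threshold are assumptions for illustration only.
compare.est <- function(swiid, swiid.se, lis, lis.se, min.diff = 1) {
  d <- swiid - lis                               # difference in Gini points
  z <- d / sqrt(swiid.se^2 + lis.se^2)           # treats the two errors as independent
  data.frame(diff = d, z = round(z, 2),
             flag = abs(d) >= min.diff & abs(z) > qnorm(0.975))
}

# Invented figures for three country-years
compare.est(swiid = c(31.0, 28.4, 36.9), swiid.se = c(0.8, 0.6, 0.7),
            lis   = c(30.6, 28.0, 34.5), lis.se   = c(0.3, 0.3, 0.4))

# The share reported in the text: 5 of 71 observations, in percent
round(100 * 5 / 71, 1)
@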


\section*{Concluding Thoughts}
Because the guidance of \citet{Jenkins2015} is purportedly based on the recommendations of \citet{Atkinson2001, Atkinson2009}, the disastrously bad outcomes it yields could be read as a warning against undertaking any broadly cross-national research on income inequality at all.  There are hints to this effect in the concluding section of (and indeed throughout) the work, and if this is in fact its author's judgment, it would perhaps be an understandable conclusion from the perspective of one whose own work has centered on using microdata to produce inequality statistics for single countries.  But this perspective is very different from that of those researchers who \emph{do} actually wish to make comparisons across countries and over time.  The difference is thrown into stark relief by the complaint that the SWIID data for the United States do not show the `sharp discontinuity' between 1992 and 1993 that was caused by the major redesign of the Current Population Survey between those two years \citep[31]{Jenkins2015}.  To those for whom the survey methodology is itself an important object of study, the elimination of such a break is quite naturally viewed as a flaw.  But for anyone seeking to draw inferences regarding income inequality and its relationships to other variables over space and time, the elimination of artifacts that do not reflect actual changes in inequality is unquestionably a feature, not a bug \citep[see, e.g.,][392]{Atkinson2009}.  

It is fortunate, then, that, as I have shown above, the guidance in \citet{Jenkins2015} bears little resemblance to the advice provided by \citet{Atkinson2001, Atkinson2009}.  The procedure used to generate the SWIID, on the other hand, does match this sound advice.  As a result, those pursuing research on income inequality across many countries and over time will often find that the SWIID is their best choice of data source.


\pagebreak

\bibliographystyle{ajps}
\bibliography{Response2}

%\end{spacing}

\end{document}